filebeat: make deep copy before notifying of config change#42992
filebeat: make deep copy before notifying of config change#42992
Conversation
|
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
There was a problem hiding this comment.
LGTM. There is at least a test failing, but it seems some flakyness, you might retry it. If you want, you can also open a flaky test issue, that would be the best so we can track the flakyness
Btw, the test you added in the PR description, should it trigger the issue? I tried here and it passes with and without the change. So I don't really get the idea of that test.
|
@AndersonQ yes, you are right, I accidentally pasted the "fixed" test. Try the one below a few times. Just in case it's not clear, that test doesn't actually test the filebeat code, I copied the code from filebeat in a separate test to check the hypothesis that references to I think it's worth it to discuss better testing approaches for this case but I'd prefer to merge this bugfix separately. diff --git a/libbeat/cmd/instance/beat_test.go b/libbeat/cmd/instance/beat_test.go
index ebfecf191c..f9c290693e 100644
--- a/libbeat/cmd/instance/beat_test.go
+++ b/libbeat/cmd/instance/beat_test.go
@@ -15,14 +15,14 @@
// specific language governing permissions and limitations
// under the License.
-//go:build !integration
-
package instance
import (
"bytes"
+ es "github.com/elastic/beats/v7/libbeat/statestore/backend/es"
"io/ioutil"
"os"
+ "sync"
"testing"
"github.com/elastic/beats/v7/libbeat/cfgfile"
@@ -262,6 +262,69 @@ elasticsearch:
require.Same(t, c, m.cfg.Config)
require.False(t, b.isConnectionToOlderVersionAllowed(), "allow_older_versions flag should now be set to false")
})
+
+ t.Run("test tmp", func(t *testing.T) {
+ b, err := NewBeat("testbeat", "testidx", "0.9", false, nil)
+ require.NoError(t, err)
+
+ cfg := `
+elasticsearch:
+ hosts: ["https://127.0.0.1:9200"]
+ username: "elastic"
+ allow_older_versions: false
+`
+ c, err := config.NewConfigWithYAML([]byte(cfg), cfg)
+ require.NoError(t, err)
+ outCfg, err := c.Child("elasticsearch", -1)
+ require.NoError(t, err)
+
+ var wg sync.WaitGroup
+ defer wg.Wait()
+
+ const size = 100
+ wg.Add(size)
+ notifiers := make([]*es.Notifier, size)
+ for i := 0; i < size; i++ {
+ n := es.NewNotifier()
+ notifiers[i] = n
+ if i%2 == 0 {
+ n.Subscribe(func(c *config.C) {
+ defer wg.Done()
+ t.Log(c.FlattenedKeys())
+ })
+ } else {
+ n.Subscribe(func(c *config.C) {
+ defer wg.Done()
+ c.Merge(map[string]string{"test": "test"})
+ })
+ }
+ }
+
+ update := &reload.ConfigWithMeta{Config: c}
+ b.OutputConfigReloader = reload.ReloadableFunc(func(r *reload.ConfigWithMeta) error {
+ for i := 0; i < size; i++ {
+ outCfg := config.Namespace{}
+ if err := r.Config.Unpack(&outCfg); err != nil || outCfg.Name() != "elasticsearch" {
+ logp.Err("Failed to unpack the output config: %v", err)
+ return nil
+ }
+ notifiers[i].Notify(outCfg.Config())
+ }
+ return nil
+ })
+ m := &outputReloaderMock{}
+ reloader := b.makeOutputReloader(m)
+
+ require.False(t, b.Config.Output.IsSet(), "the output should not be set yet")
+ require.True(t, b.isConnectionToOlderVersionAllowed(), "allow_older_versions flag should be true from 8.11")
+
+ err = reloader.Reload(update)
+ require.NoError(t, err)
+ require.True(t, b.Config.Output.IsSet(), "now the output should be set")
+ require.Equal(t, outCfg, b.Config.Output.Config())
+ require.Same(t, c, m.cfg.Config)
+ require.False(t, b.isConnectionToOlderVersionAllowed(), "allow_older_versions flag should now be set to false")
+ })
}
type outputReloaderMock struct { |
|
This prevents concurrent read & write map access. Unrelated, but I've escalated one log line to Info to allow for easier verifying that ES store is being used from agent logs. Fixes #42815 (cherry picked from commit f1e42fc) # Conflicts: # filebeat/beater/filebeat.go # libbeat/statestore/backend/es/store.go # x-pack/filebeat/tests/integration/managerV2_test.go
This prevents concurrent read & write map access. Unrelated, but I've escalated one log line to Info to allow for easier verifying that ES store is being used from agent logs. Fixes #42815 (cherry picked from commit f1e42fc) # Conflicts: # filebeat/beater/filebeat.go # libbeat/statestore/backend/es/store.go # x-pack/filebeat/tests/integration/managerV2_test.go
This prevents concurrent read & write map access. Unrelated, but I've escalated one log line to Info to allow for easier verifying that ES store is being used from agent logs. Fixes #42815 (cherry picked from commit f1e42fc) # Conflicts: # filebeat/beater/filebeat.go # libbeat/statestore/backend/es/store.go # x-pack/filebeat/tests/integration/managerV2_test.go
…-packaging-steps * upstream/main: filebeat: make deep copy before notifying of config change (elastic#42992) [metricbeat] Add a new 'match_by_parent_instance' option to 'perfmon' module (elastic#43002) Don't package arm64 on amd64 workers (elastic#43026) [REVERT] Update Stack Monitoring data stream to 9 (elastic#43052) FIPS Build (elastic#42402) refactor: drop custom fsync implementation (elastic#42066) Update CHANGELOG.asciidoc docs: Prepare Changelog for 8.17.3 (elastic#42980) (elastic#43029) Fix boolean key in security pipelines and sync pipelines with integration. (elastic#43027) Skip test case for sequoia (elastic#42996) [main](backport elastic#42976) docs: Prepare Changelog for 8.16.5 (elastic#43005) x-pack/filebeat/input/entityanalytics/provider/activedirectory: do not consider computers to be users (elastic#42796)
Proposed commit message
This prevents concurrent read & write map access.
Unrelated, but I've escalated one log line to
Infoto allow for easier verifying that ES store is being used from agent logs.Fixes #42815
Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration filesI have added an entry inCHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.How to test this PR locally
This is a blind spot: We have reproduced this on agentless deployments using abusech (see issue). However, I couldn't find a way to reproduce in an integration or unit test.
The closest is this hacky test:
however, that doesn't test any code to be commited
Related issues
Screenshots
Errors stop once the image with the fix is deployed
